This is the output for the comparison sc.250425.K562.IGVF.ignore_TPM. The following analyses evaluate how well the experimental data agree with the predicted CRE-gene pairs. The following input files were used:

Experimental data: EPCrisprBenchmark_ensemble_data_GRCh38.intGENCODEv43.1Mb.tsv.gz
Predictions: encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, pairs.STARE.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz

The following parameters in the config file config/pred_config.250411.K562.cv.ignore_TPM.txt were used to overlap predictions with experimental data and to assess the performance of predictors. If no config file was provided, one was generated using default values. It is strongly recommended to use a prediction config file to control how predictors are treated.


1 Overlap between predictors and CRISPR data

The number of CRISPR enhancer-gene (E-G) pairs that overlap predicted enhancer-gene pairs is counted for each predictor. CRISPR E-G pairs that do not overlap any predicted pair are considered not predicted. A large fraction of CRISPR E-G pairs without overlapping predictions leads to poor performance.
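The overlap count described above can be illustrated with a minimal sketch. It assumes each E-G pair is a tuple of chromosome, start, end, and gene with half-open coordinates, and that a CRISPR pair counts as predicted when its element overlaps a predicted element for the same gene. The function name and the example coordinates are hypothetical and are not taken from the pipeline that produced this report.

```python
from collections import defaultdict

def count_overlapping_pairs(crispr_pairs, predicted_pairs):
    """Count CRISPR E-G pairs that overlap at least one predicted E-G pair.

    Each pair is a tuple (chrom, start, end, gene) with half-open coordinates.
    Returns (number overlapping, number not predicted).
    """
    # Index predicted elements by (chromosome, gene) so only matching genes are compared.
    index = defaultdict(list)
    for chrom, start, end, gene in predicted_pairs:
        index[(chrom, gene)].append((start, end))

    overlapping = 0
    for chrom, start, end, gene in crispr_pairs:
        candidates = index.get((chrom, gene), [])
        # Two half-open intervals overlap if each starts before the other ends.
        if any(p_start < end and start < p_end for p_start, p_end in candidates):
            overlapping += 1
    return overlapping, len(crispr_pairs) - overlapping

# Made-up example data for illustration only.
crispr = [
    ("chr1", 100, 200, "GENE_A"),
    ("chr1", 500, 600, "GENE_A"),
    ("chr2", 100, 200, "GENE_B"),
]
predicted = [
    ("chr1", 150, 250, "GENE_A"),
    ("chr2", 100, 200, "GENE_C"),
]
n_overlap, n_missing = count_overlapping_pairs(crispr, predicted)
print(n_overlap, n_missing)
```

In this toy input only the first CRISPR pair shares both gene and coordinates with a prediction, so the other two would be treated as not predicted.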

K562


Number of CRISPR enhancer-gene pairs overlapping enhancer-gene pairs in predictions.




2 Precision-Recall performance

Precision-recall (PR) curves are used to compare the performance of different predictors on the experimental data. The area under the PR curve (AUPRC) provides a single metric of a predictor's performance.
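As a rough illustration of how a PR curve and its area can be derived from predictor scores and experimental labels, the sketch below ranks pairs by score and integrates step-wise (average precision). This is an assumption made for illustration: the report does not state which integration rule was used, and the scores and labels are made up. At least one positive label is assumed.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) at every distinct score cutoff, highest score first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all pairs tied at this score have been counted.
        is_last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points

def average_precision(points):
    """Step-wise area under the PR curve: precision times each recall increment."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area

# Made-up scores and labels (1 = experimental positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
points = precision_recall_points(scores, labels)
auprc = average_precision(points)
print(round(auprc, 4))
```

For inverse predictors, where lower scores indicate stronger predictions, the scores would need to be negated before ranking.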

2.1 Precision-recall curves

K562


Precision-recall curves for all predictors in all matching experimental cell types. Dots represent alpha cutoff values as specified in the pred_config file. If no alpha was set, the minimum alpha in the predictions was used by default, or the maximum for inverse predictors. Distance to TSS was added as a baseline predictor and computed from the provided 'gene universe'.



2.2 Performance summary

Precision-recall performance summary for predictors. The table shows the area under the PR curve (AUPRC), precision at the specified thresholds (if any), and precision at a minimum sensitivity (recall) of 0.7.
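One way to read "precision at a minimum recall of 0.7" is the precision at the most stringent cutoff whose recall reaches 0.7. The sketch below follows that reading. It is an assumption, not a confirmed description of how the table was computed, and the function name and the PR points are hypothetical.

```python
def precision_at_min_recall(points, min_recall=0.7):
    """Return the precision at the first cutoff whose recall reaches min_recall.

    `points` is a list of (recall, precision) tuples ordered by increasing recall,
    i.e. from the most to the least stringent score cutoff.
    Returns None if no cutoff reaches the requested recall.
    """
    for recall, precision in points:
        if recall >= min_recall:
            return precision
    return None

# Made-up PR points for illustration only.
pr_points = [(0.25, 1.0), (0.5, 0.67), (0.75, 0.6), (1.0, 0.4)]
result = precision_at_min_recall(pr_points)
print(result)
```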


3 Receiver Operating Characteristic performance

Receiver Operating Characteristic (ROC) curves are an alternative method to compare performance by computing true positive rates and false positive rates for each predictor.
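The true positive and false positive rates behind a ROC curve can be sketched as follows. The example uses made-up scores and labels and integrates with the trapezoidal rule. It assumes at least one positive and one negative label, and it is not claimed to match the exact procedure used to generate the plots in this report.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, starting at (0, 0)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all pairs tied at this score have been counted.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auroc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Made-up scores and labels (1 = experimental positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
roc = roc_points(scores, labels)
area = auroc(roc)
print(round(area, 4))
```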

K562


ROC curves for all predictors in all matching experimental cell types. Distance to TSS and nearest genes/TSS were added as baseline predictors and computed from the provided 'gene universe'.




4 Effect size vs predictors

Each predictor listed in the prediction data is plotted against the effect size of enhancer perturbations reported in the experimental data (e.g. percent change in expression). These plots show how well a predictor is associated with effects observed in CRISPRi enhancer screens in an intuitive way.
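The association between predictor scores and effect sizes is summarised in the plots by Spearman's rank correlation coefficient. A minimal sketch of that statistic, computed as the Pearson correlation of average ranks, is given below. The helper names and the example values are illustrative assumptions, and the inputs are assumed not to be constant.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Made-up values: higher scores paired with stronger decreases in expression.
predictor_scores = [0.9, 0.6, 0.3, 0.1]
effect_sizes = [-40.0, -25.0, -5.0, 2.0]
rho = spearman_rho(predictor_scores, effect_sizes)
print(round(rho, 4))
```

In this toy example the ranks are perfectly reversed, so rho is -1. A strongly negative rho would be expected for a good predictor when effect size is expressed as percent change and enhancer perturbation reduces expression.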

K562


Predictors versus CRISPRi effect size. Effect size is defined as percent change in target gene expression upon CRISPRi perturbation of an enhancer. Effect size values are taken from 'EffectSize' column in experimental data, while predictor scores correspond to scores from prediction files. Numbers show Spearman's rank correlation coefficient (rho) between effect size and predictor scores.




5 Predictor scores versus experimental outcome

The scores of each predictor are compared between experimental positives and negatives to provide another assessment of how well the predictor distinguishes true enhancer-gene pairs from negatives.

K562


Predictor scores vs. experimental outcome for all predictors. Each point represents one E-G pair in the experimental data. Cases where the predictor value is 0 or infinite might correspond to E-G pairs that were not found in the predictions, for which predictor values were filled in according to the prediction config file.




6 Performance as function of distance to TSS

Enhancer-gene pairs are binned based on their distance to TSS and predictor performance is assessed for each bin.
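The binning step can be sketched as below. The bin edges, bin labels, and the `distance_to_tss` field name are hypothetical, since the text of this report does not state the bins that were used. Each resulting group could then be scored separately, for example with an AUPRC per bin.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges in bp; the bins used in the report are not stated in the text.
BIN_EDGES = [10_000, 100_000, 250_000]
BIN_LABELS = ["0-10 kb", "10-100 kb", "100-250 kb", ">=250 kb"]

def distance_bin(distance_bp):
    """Map a distance to TSS (bp, sign ignored) to its bin label.

    Bins are half-open: a distance equal to an edge falls into the next bin.
    """
    return BIN_LABELS[bisect_right(BIN_EDGES, abs(distance_bp))]

def group_by_distance(pairs):
    """Group E-G pairs (dicts with a 'distance_to_tss' key) by distance bin."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[distance_bin(pair["distance_to_tss"])].append(pair)
    return dict(groups)

# Made-up E-G pairs for illustration only.
pairs = [
    {"name": "e1|GENE_A", "distance_to_tss": 5_000},
    {"name": "e2|GENE_A", "distance_to_tss": -20_000},
    {"name": "e3|GENE_B", "distance_to_tss": 50_000},
    {"name": "e4|GENE_B", "distance_to_tss": 300_000},
]
groups = group_by_distance(pairs)
print({label: len(members) for label, members in groups.items()})
```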

6.1 AUPRC as function of distance

K562


Area under the Precision-Recall Curve (AUPRC) for different distance to TSS bins (kb).



6.2 Precision-recall curves

K562


Precision-Recall curves for different distance to TSS bins (kb).



6.3 Predictor scores versus experimental outcome

K562


Predictor scores versus experimental outcome for different distance to TSS bins (kb).



6.4 Effect size vs predictors

K562


CRISPR effect size vs predictor scores for different distance to TSS bins (kb).




7 Subset by gene and enhancer features

If any gene or enhancer features are provided, versions of the PR curves, predictor vs. experiment plots, and effect size plots faceted by these features are created.

7.1 Precision-recall curves

7.1.1 Gene features

7.1.2 Enhancer features

7.2 Predictor scores versus experimental outcome

7.2.1 Gene features

7.2.2 Enhancer features

7.3 Effect size vs predictors

7.3.1 Gene features

7.3.2 Enhancer features


8 Correlation between predictors

This section examines how well predictor scores correlate with each other for E-G pairs in the experimental data.

K562


Correlation of scores between predictors for experimental E-G pairs.




9 Properties of the experimental dataset

Different features of the experimental data are investigated.

9.1 Distance to TSS distribution

K562


Distance to TSS distributions for all E-G pairs in experimental data. E-G pairs are partitioned according to whether they were identified as enhancer-gene interactions (positives) or negatives.



9.2 Overlapping features

This plot shows the number of experimentally tested candidate enhancers overlapping the provided genomic features. If no features were provided, the plot is not generated.


10 Sources